Hu Wei

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

May 29, 2026
Via arXiv

CrystalXRD-Bench: Benchmarking Vision-Language Models for XRD Peak Indexing Across Diverse Crystalline Materials

May 28, 2026
Via arXiv

DMC-CF: Dynamic Multimodal CounterFactual QA benchmark for Causal Reasoning

May 28, 2026
Via arXiv

Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation

May 27, 2026
Via arXiv

Qwen-Image-2.0 Technical Report

May 11, 2026
Via arXiv

MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System

May 07, 2026
Via arXiv

Architectural Design Decisions in AI Agent Harnesses

Apr 20, 2026
Via arXiv

From Agent Loops to Structured Graphs: A Scheduler-Theoretic Framework for LLM Agent Execution

Apr 13, 2026
Via arXiv

FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

Apr 04, 2026
Via arXiv

IndustryCode: A Benchmark for Industry Code Generation

Apr 03, 2026
Via arXiv